- Language modeling
- Feed-forward neural networks
- Recurrent neural networks
A Language Model is a statistical model that predicts the next word or sequence of words in a sentence. These models are fundamental in natural language processing (NLP) and are used to understand and generate human language.
A machine learning subfield concerned with learning representations of data. Exceptionally effective at learning patterns.
Deep learning algorithms attempt to learn (multiple levels of) representation by using a hierarchy of multiple layers.
\[h = \sigma(W_1x + b_1)\] \[y = \sigma(W_2h + b_2)\]
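The two equations above describe a two-layer feed-forward network. A minimal NumPy sketch of that forward pass, with hypothetical layer sizes and randomly initialized weights (nothing here is prescribed by the notes):

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    """h = sigma(W1 x + b1), y = sigma(W2 h + b2)."""
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    return y

y = forward(np.array([1.0, 0.5, -0.5]))
```

Because both layers end in a sigmoid, every component of `y` lies strictly between 0 and 1.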
- Optimize the objective/cost function \(J(\theta)\)
- Generate an error signal that measures the difference between predictions and target values
- Use the error signal to change the weights and get more accurate predictions
Subtracting a fraction of the gradient of the objective/cost function \(J(\theta)\) moves you towards a (local) minimum of the cost function.
Update each element of \(\theta\):
\[\theta^{new}_j = \theta^{old}_j - \alpha \frac{\partial}{\partial \theta^{old}_j} J(\theta)\]
Matrix notation for all parameters (\(\alpha\): learning rate):
\[\theta^{new} = \theta^{old} - \alpha \nabla_{\theta} J(\theta)\]
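The update rule above can be sketched on a toy quadratic objective; the target vector and learning rate below are arbitrary choices for illustration:

```python
import numpy as np

# Toy objective J(theta) = ||theta - t||^2 with a hypothetical target t.
t = np.array([3.0, -1.0])

def grad_J(theta):
    """Gradient of J: 2 * (theta - t)."""
    return 2.0 * (theta - t)

theta = np.zeros(2)
alpha = 0.1  # learning rate
for _ in range(100):
    # theta_new = theta_old - alpha * grad J(theta_old)
    theta = theta - alpha * grad_J(theta)
```

Each iteration shrinks the distance to the minimizer by a constant factor, so `theta` converges to `t`.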
Recursively apply the chain rule through each node
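Applying the chain rule node by node can be shown on a one-weight network (the values of `w`, `x`, and the target are arbitrary), checked against a numerical gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: a = w*x, y = sigmoid(a), L = (y - t)^2
w, x, t = 0.5, 2.0, 1.0
a = w * x
y = sigmoid(a)
L = (y - t) ** 2

# Backward pass: chain rule through each node.
dL_dy = 2 * (y - t)
dy_da = y * (1 - y)   # derivative of the sigmoid
da_dw = x
dL_dw = dL_dy * dy_da * da_dw

# Sanity check with a central-difference numerical gradient.
eps = 1e-6
L_plus = (sigmoid((w + eps) * x) - t) ** 2
L_minus = (sigmoid((w - eps) * x) - t) ** 2
num_grad = (L_plus - L_minus) / (2 * eps)
```

The analytic and numerical gradients agree to high precision.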
A learned hypothesis may fit the training data very well, even outliers (noise), but fail to generalize to new examples (test data)
How to avoid overfitting?
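Common answers include regularization, dropout, and early stopping. As one sketch, L2 regularization adds a penalty on large weights to the data loss; the function below is a toy illustration, not a method prescribed by the notes:

```python
import numpy as np

def l2_regularized_loss(pred, target, theta, lam=0.01):
    """Mean squared error plus an L2 penalty that discourages large weights."""
    data_loss = np.mean((pred - target) ** 2)
    penalty = lam * np.sum(theta ** 2)
    return data_loss + penalty
```

With a perfect prediction the loss reduces to the penalty term alone, which shows how the regularizer pushes weights towards zero even when the data is fit exactly.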
\[z_t = \sigma(W_z \cdot [h_{t-1}, x_t])\] \[r_t = \sigma(W_r \cdot [h_{t-1}, x_t])\] \[\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])\] \[h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t\]
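The gate equations above (a GRU cell) can be sketched directly in NumPy; the hidden and input dimensions and the random weights are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, Wz, Wr, W):
    """One GRU step; each weight matrix acts on the concatenation [h_prev, x]."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ hx)   # update gate
    r = sigmoid(Wr @ hx)   # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde   # interpolate old and candidate state

# Hypothetical sizes: hidden dim 3, input dim 2 (so [h, x] has dim 5).
rng = np.random.default_rng(1)
Wz = rng.normal(size=(3, 5))
Wr = rng.normal(size=(3, 5))
W = rng.normal(size=(3, 5))
h = gru_step(np.zeros(3), np.array([1.0, -1.0]), Wz, Wr, W)
```

Since the candidate state comes from a tanh and the gates lie in (0, 1), the new hidden state stays bounded in (-1, 1).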
Language Model: A system that predicts the next word
Deep learning can be applied for automatic feature engineering
Recurrent Neural Network: A family of deep learning / neural networks that:
- Take sequential input (text) of any length, applying the same weights on each step
- Can optionally produce output on each step
Recurrent Neural Network ≠ Language Model
RNNs can be used for many other things
Language modeling can be done with different models, e.g., n-grams or transformers: GPT is an LM!
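To make the n-gram point concrete, here is a minimal count-based bigram language model over a toy corpus (the corpus and function name are invented for illustration):

```python
from collections import Counter

# Toy corpus, purely hypothetical.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigrams and the unigram contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(word, next_word):
    """P(next_word | word) estimated by relative bigram frequency."""
    return bigrams[(word, next_word)] / unigrams[word]
```

For example, "the" is followed by "cat" in 2 of its 3 occurrences as a context, so `p_next("the", "cat")` is 2/3.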